Transforming XML With XSLT

Lesson 7 - Transforming XML With XSLT - Chapter 1

Introduction

Congratulations! You've made it halfway through the course. How are you feeling? Think you've mastered XML yet?

Well, today's lesson will demonstrate a new language that I think you're going to find helpful: XSLT.

Using the XSLT (XSL Transformations) language, you can change an XML document into various other kinds of documents such as ordinary text, Web pages (HTML), databases, and even modified XML files.

Say that you decide to add a new element to the cookbook program that specifies how long it takes to prepare each recipe. You probably don't have the time or desire to go through the XML file typing in new <preptime></preptime> tag pairs to every <recipe> element by hand. Let XSLT do the job for you.

You'll also learn how to avoid repetitive code in the cookbook program. Both the Form_Load event and a ShowAllTitles button event do pretty much the same thing: display all the recipe titles in the listbox. So why duplicate the code that does that job?

And you'll explore two ways to avoid this kind of code duplication: have one event Call (trigger) the other event, or create a custom sub of your own that both events can Call to do the job.

Chapter 2

Getting Started With XSLT

Let's consider the human brain for a moment. Thinking involves two main components: information and processing.


Human information comes from memory or from stimuli. Processing is what we do with that information, like remembering someone's name and then saying, "Hi, John."

Recall that computers are called data processors because they do these same two primary thinking tasks: store data and do something with that data. XML is data, and our VB cookbook program processes the XML data—alphabetizes, deletes, displays, searches it, and so on.


As you've seen in previous lessons, XML is an excellent way to store data because it organizes information in a logical way in a structure of relationships (child, parent, sibling). Also, its plain English tags make it easy to read. This makes it clear what type of content a datum is, and where each datum begins and ends.

But not all data storage methods are as excellent as XML.

Imagine an auto shop's billing system. Suppose somebody comes in for an oil change and this requires three items that you should put on the bill: oil, filter, and wiper blades. The billing system needs to store the name and price of each item. It's clearly impractical to just pack this data together like this:

 Valvoline oil23.15filter12.74wiper blades20

How can the computer tell where one datum ends and another begins? There aren't any XML tags here or any other way to separate the data.

You could write some programming that would scan this data, checking each letter to see if it's a digit or an alphabetic character. The numeric digits would mean price data, and alphabetic characters would mean car parts data.

If you're interested, a little VB program can detect the difference between digits and characters by testing each character in turn.
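The original program isn't reproduced here, so this is only a minimal sketch of how such a scan might look. The module name DigitDetector is my own invention; Char.IsDigit and Char.IsLetter are standard .NET tests.

```vb
Module DigitDetector
    Sub Main()
        ' The smashed-together billing data from the example above
        Dim data As String = "Valvoline oil23.15filter12.74wiper blades20"

        ' Look at every character and report what kind it is
        For Each ch As Char In data
            If Char.IsDigit(ch) Then
                Console.WriteLine(ch & " is a digit")
            ElseIf Char.IsLetter(ch) Then
                Console.WriteLine(ch & " is a letter")
            End If
        Next
    End Sub
End Module
```

Notice that spaces and decimal points fall into neither group, which hints at how quickly this guessing approach gets complicated.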

But what would happen if the name of the oil was Valvoline 5W-30? When the computer comes upon that 5, it will believe the 5 represents a price—even though it's actually a car part. And there's no practical way to write a program that will make a computer see this distinction.

Most of us humans can quickly see the difference, because we're smarter than computers at this kind of thing—so far anyway. We have the advantage of common sense and experience. Put another way: We've heard of 30-weight oil.

Enter Delimiters

A traditional database feature called delimiters solves this smashed-together-data problem by putting a comma or some other delimiter between data to separate them. Commas are used in one classic technique employed by spreadsheets like Excel and many other programs. It's called comma separated values, or CSV. It looks like this:

Valvoline oil,23.15,filter,12.74,wiper blades,20

This way, the computer doesn't have to try to figure out what's what.

But there's a problem with delimiters: You can't use the delimiter itself in your actual data without extra work, such as enclosing the whole field in quotation marks. Stored as-is, this example from our recipe data would confuse the computer, which would wrongly interpret the commas as delimiters:

finely chop 2 medium carrots, 2 medium onions, and 2 celery stalks

In spite of this rather major issue, CSV is still widely used, particularly when storing simple, one-word data or brief phrases in lists or tables.

So someday you might have to transform an XML file into a CSV file. And this is where XSLT comes into play.

Enter XSLT

XSLT specializes in this kind of thing—transforming XML to make it compatible with other kinds of data storage, or even other kinds of XML structures.

We're going to try using XSLT to create a CSV file.

Tip

XML and XSL Editors

VS includes an editor for working with XSLT documents. You'll likely prefer to use this editor rather than writing your XSLT code in Notepad or some other simple editor, because the VS editor will alert you to certain kinds of errors in your code.

If you open a file with an .xml or .xsl/.xslt filename extension, you automatically open the VS XML editor. Or you can create a new XSLT file in the editor by choosing Project > Add New Item > XSLT. (Programmers use XSL and XSLT interchangeably. You can use either term, and you can use either filename extension: myfile.xsl or myfile.xslt.)

 
[Figure: New XSLT file]

XSLT uses templates to specify what you want changed in an XML document. As you'll see, many XSLT files employ two templates: the first makes an exact copy of everything in the XML file, and the second one describes how you want to modify that copy (add new elements, delete them, replace all the tags with commas, or whatever).

Think of the first template as a sculptor rolling in a slab of marble, and the second template as carving the statue.

This XML to CSV example consists of three files:

  • XML, the original car parts data
  • XSLT, the file that specifies how you want that data changed from XML into CSV
  • A Form1.vb program that makes these changes by applying the templates in the XSLT code to the original XML data

All right, let's look at the XSLT that changes XML into CSV code.

  1. Using Windows Explorer, double-click the C:\XML L7 Finished folder to open it.
  2. Double-click the XSL folder to open it.
  3. Double-click the VB project file named XSL.sln to open it in VS.
  4. You may or may not see four tabs at the top of the Code window. If you don't, open them by double-clicking their filenames in the Solution Explorer window: Car Parts.xml, XML to CSV.xslt, Rename.xslt, and Form1.vb. (If the Solution Explorer isn't visible, choose View > Solution Explorer.)
 
[Figure: Three documents and a program]

Examining the Code

Back to the oil change. Assume that this auto shop's part name/price information is stored in XML. But their billing software uses CSV. Your job is to transform the XML data into a CSV file.

Click the Car Parts.xml tab in the Code editor. You'll see this typical XML document:
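The listing itself isn't reproduced here. Based on the element names used in this lesson (carparts, carpart, partname, and price) and the three items on the bill, the document presumably looks something like this sketch. The exact layout and XML declaration are assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<carparts>
  <carpart>
    <partname>Valvoline oil</partname>
    <price>23.15</price>
  </carpart>
  <carpart>
    <partname>filter</partname>
    <price>12.74</price>
  </carpart>
  <carpart>
    <partname>wiper blades</partname>
    <price>20</price>
  </carpart>
</carparts>
```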

In this lesson, we'll transform this data every which way! Take a look at the XSL document that translates our XML into CSV:

Click the XML to CSV.xslt tab in the Code editor to see the code.
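The file's contents aren't reproduced here, so treat the following as a minimal sketch rather than the exact file in the project. It assumes the carpart, partname, and price element names described in this lesson.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- CSV is plain text, so ask for text output rather than XML or HTML -->
  <xsl:output method="text"/>

  <!-- For each carpart, write its two values, each followed by a comma -->
  <xsl:template match="carpart">
    <xsl:value-of select="partname"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="price"/>
    <xsl:text>,</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The tags never appear in the output because the template only asks for the values inside them.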

In an XSL file that transforms data, you'll find one or more templates. These templates behave like a sub in VB: They do something to data—they process it.

XSL templates usually have two jobs to do:

  • Specifying what to copy from the original document (the entire document, only elements with a particular tag name such as "employee," or some other subset)
  • Specifying how the document should be transformed (delete a particular element, rename an element, strip off attributes, and so on)

In this screen recording, I'll take you through the XSL code and show you how it transforms the data.


Chapter 2, Video 1: "Understanding How XSL Code Transforms XML Documents", TRANSCRIPT

Let's see how this XSL code transforms the data. In this line here, we define the output method as text because CSV files are just basic text files as opposed to HTML or XML. Then the template rule here tells the processor, which, in this case, is VB, to look for the carpart element and change each one into plain text by stripping off the XML tags and replacing them with a simple comma. The Value-of element is like the VB InnerText property. It's an element's actual data minus the tags. For example, the text "wiper blades" is a value of a partname element.

What I find most interesting about this template here is how it strips off all the XML tags. It does this by simply not including them in the template. It uses the Value-of command to extract the data, but it doesn't include those chevron-surrounded tags such as partname or price in the output. Chevrons are these greater- or less-than symbols that we use frequently. In any case, by not including those chevron tags in the output, it's as if your name were left off a list, and as a result, you simply vanished. We'll use this trick later in the lesson to delete elements. How will we do that? We just won't mention those elements in the template and they'll be gone.

But what about the commas? Why are they enclosed in the XSL text commands? The text in an XSL template is inserted into the CSV output that we're creating here as a text literal. In other words, if you type a comma inside a template, you get a comma in the resulting CSV data. If you type three commas and the phrase "hard cheese," you get three commas and "hard cheese." But unless you enclose the commas or some other literal text within the XSL text commands, an unwanted carriage return, as if you had pressed the Enter key, will be inserted into the CSV file.

END TRANSCRIPT

 
[Figure: Two commas]

Now that we've examined the XSL transformation code, it's time to process the data. We'll apply the XSL template's transformation rules to the XML data in the Car Parts.xml file.

Applying the XSL Template's Transformation Rules

Fortunately, VB includes lots of commands you can use to work with XSL. To add those commands, just import VB's XSL library in your first line of code (shown below). Here's the complete VB program that transforms XML into CSV:

This code is straightforward enough. You begin by creating an object variable I named XTransform to hold your XSLT code, and use the Load method to copy the XSLT code into this XTransform variable.

Finally, you use VB's Transform command to apply the template rules to the XML data source file (Car Parts.xml) and save the resulting comma-delimited data to a Car Parts.csv output file. This one command does all the work for you.
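The program listing isn't reproduced here, so here is a minimal sketch of what the description above amounts to. I'm assuming the .NET XslCompiledTransform class from the System.Xml.Xsl namespace and that the work happens in the Form1_Load event; the folder path comes from the steps in this lesson. The project's actual code may differ.

```vb
Imports System.Xml.Xsl

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Folder used in this lesson's steps; adjust it to match your own setup
        Dim folder As String = "C:\XML L7 Finished\XSL\"

        ' Create the object variable and load the XSLT template rules into it
        Dim XTransform As New XslCompiledTransform()
        XTransform.Load(folder & "XML to CSV.xslt")

        ' Apply the rules to the XML source and save the result as a CSV file
        XTransform.Transform(folder & "Car Parts.xml", folder & "Car Parts.csv")
    End Sub
End Class
```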

Follow these steps to transform the XML into CSV:

  1. Click the Form1.vb tab in the Code editor.
  2. Press F5.
  3. Using Windows Explorer, double-click the C:\XML L7 Finished\XSL folder to open it.
  4. Right-click the Car Parts.csv file.
  5. In the context menu, choose Open With > Notepad.

You'll see these results:


Valvoline oil,23.15,

filter,12.74,

wiper blades,20,

XSL Formatting vs. XSL Transforming

Remember in a previous lesson when you used an XSL file to format XML for display in a browser? That's similar to what you're doing now.

You had a data file (XML) and a separate file (XSL) that described how you wanted that data formatted. Then you submitted both files to a browser like IE that knows how to apply XSL formatting rules to XML data and display the result. Built into browsers is an XSL processor (recall that XSL is often called XSLT when used to transform data, but the terms are used interchangeably).

In this lesson, you used an XSLT file to transform XML data into the CSV format. This is similar to using XSL to format XML data. But there are three major differences:

  • You now use template rules rather than style rules. (You're not formatting content here. You're changing the fundamental data structure of the XML document.)
  • You use a VB program to process the XML and XSLT files, instead of submitting these files to the processor built into IE.
  • You send the results of your processing (the output) to a .csv file. (You're not sending the output to IE to be displayed.)

So changing XML to CSV works like this:

[Figure: Input (XML and XSLT) to Output (CSV)]

Next let's try another XSL experiment: restructuring XML files and saving the result as a new XML file.

Chapter 3

Transforming XML to XML

Not all XML transformations require XSLT. Before you launch a new XSLT project, stop and think if there might be an easier way to get the job done.

For example, let's say for some reason you need to rename the <price> element in your XML file (Car Parts.xml) to <saleprice>. You can write XSLT code that does this transformation. But if your XML document isn't enormous, it might be simpler to just load it into a word processor like Word, and then press CTRL + H to use the Find and Replace feature.

Of course, if you work for Social Security, your XML document will be far too large to fit into Word, so you'll have to use XSL or write a program in VB or some other language to do the renaming.

But if you're interested, here's one way to use XSLT to rename elements: In the VB Code window, click the Rename.xslt tab to view the XSLT file that I wrote. When you add a new XSLT file in VS (Project > Add New Item > XSLT), the editor automatically adds some default XSL code, called an identity template.

The first template is an identity template, and the highlighted template is the code that I added to Microsoft's default code:

Here's how this code works. Recall that in XSL, a template is code that does something—processes data—like a VB sub. Two templates appear in this code.

The identity template is famous and often used in XSL work. So often, in fact, that VS automatically inserts it whenever you add a new XSLT file in VS. You can delete it, but you'll often want to use it.

The identity template tells our VB program to copy everything as-is from the XML input or source file (car parts.xml) to the output file (which we'll call new parts.xml).

If you only execute the first (identity) template, you'd get an exact copy of the original. But by adding the second template, you're saying, "Do an identical copy, but with these exceptions."

In this example, the second template interferes with the first template by saying, "Wait, change the name of price to saleprice when you copy this data."

So let's sum up what this pair of templates does:

  • The first template copies the whole XML document line-for-line, just like a copy machine.
  • The second template specifies changes you want to make, like a teacher marking up an essay.
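The Rename.xslt listing isn't reproduced here. As a sketch, the pair of templates summarized above might look like this. The identity template follows the widely used standard form, and the second template assumes the price and saleprice names from this example. The default code VS generates may include extra attributes not shown here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every attribute and node exactly as it is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The exception: write each price element out under the name saleprice -->
  <xsl:template match="price">
    <saleprice>
      <xsl:apply-templates select="@* | node()"/>
    </saleprice>
  </xsl:template>

</xsl:stylesheet>
```

The more specific match="price" rule wins over the general identity rule whenever the processor reaches a price element, which is how the exception takes effect.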

To test this renaming transformation, just change the following two highlighted file names in the Form1.vb code tab:
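The highlighted listing isn't reproduced here. Assuming the program uses the .NET XslCompiledTransform class and runs in Form1_Load (both are my assumptions), the two changed file names would sit in the Load and Transform lines, roughly like this:

```vb
Imports System.Xml.Xsl

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Folder used in this lesson's steps; adjust it to match your own setup
        Dim folder As String = "C:\XML L7 Finished\XSL\"

        Dim XTransform As New XslCompiledTransform()

        ' Change 1: load the renaming rules instead of the CSV rules
        XTransform.Load(folder & "Rename.xslt")

        ' Change 2: save the result as a new XML file instead of a CSV file
        XTransform.Transform(folder & "Car Parts.xml", folder & "New Parts.xml")
    End Sub
End Class
```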

Then press F5 to execute the Form1.vb code. If you look at the New Parts.xml output file, you'll see that <price> was renamed to <saleprice>:

Copying Attributes

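The @* part of the two @*|node() patterns (one in the match attribute, one in the select attribute) isn't strictly necessary for our example, because it handles attributes, and our example doesn't rely on any. Without @*, the template copies elements and their text, but no attributes.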

But the attribute commands do no harm, and you might work with attributes someday. So it's just as well to leave them in as part of the identity template.  

The @* symbols mean copy attributes, and node() means copy elements (technically, node() matches any child node: elements, text, comments, and processing instructions, but not attributes or the root node). In our example, those nodes are the elements and the text inside them. The pipe symbol (|) means "or" and combines the two patterns.

Adding a New Element

Here's another transform to try. Let's say you want to add a new element named quantity to your XML structure, and say you want to put it just above the existing price element in the XML document.

Click the Add Element.xslt tab in the code window:

The period in the select attribute of the code line <xsl:copy-of select="."/> just means "the current element," so this line copies the matched element together with its data. If you omit this line, the partname element will be left out of the resulting XML file.
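The Add Element.xslt listing isn't reproduced here. One way to get the result described above is to match partname, copy it, and then write the new element right after it, which puts quantity just above price. This is a sketch under that assumption, and the value 1 inside quantity is only a placeholder of mine.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything as-is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Exception: after each partname, add a quantity element -->
  <xsl:template match="partname">
    <!-- Copy the partname element itself, tags and data included -->
    <xsl:copy-of select="."/>
    <!-- Placeholder value; use whatever default suits your data -->
    <quantity>1</quantity>
  </xsl:template>

</xsl:stylesheet>
```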

To test this example, you just need to make the usual two changes (highlighted below) to the VB code you used earlier in this lesson. In the Form1.vb tab in the Code window, change the name of the xsl file to Add Element.xslt, and change the name of the output (result) file to New Parts Add.xml:

Press F5 to execute the VB code, and then use Windows Explorer to find and double-click the New Parts Add.xml file. (This output file is located in the C:\XML L7 Finished\XSL\ folder, because that's the file path you specified in the VB code above.)

When you open the XML file in Notepad or some other editor, you should see that the transformation added the new <quantity> element.

Deleting Elements

Deleting an element is cool. All you do is ignore it: don't produce any output for it in the second template. For example, if you want to delete the price element, just match that element in the second template, but then do nothing with it in the template, like this:

The second template here is empty. You can see that there's no code to rename or otherwise transform this element.
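The listing isn't reproduced here, but an empty second template of the kind just described can be sketched like this, assuming the price element from the car parts example:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything as-is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: price elements are matched but nothing is written -->
  <xsl:template match="price"/>

</xsl:stylesheet>
```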

All right, next up we'll be adding a Show All Titles button to the cookbook program!

Chapter 4

Showing All the Recipe Titles

It's time to add a button to the cookbook program that displays all the recipe titles when the user clicks it.

I know you're up to solving puzzles, so try this challenge:

XML Challenge!

Take a good look at this FillListboxWithTitle code. Does it seem familiar? Can you figure out where you might have seen it before?

This code is almost identical to the code in the Form_Load event!




You might say, why even bother with a button that displays all the titles, since the program already filled that listbox in the Form_Load event?

True, it did. But there's a special situation where you need to let the user refresh the listbox by hand.

After the user does a search, the listbox displays only those recipes that contain the word or phrase the user searched for. This means that the listbox probably no longer contains all the recipe titles.

In the Search event, we have a line of code that turns the listbox blue to alert the user that he or she is seeing only a subset of the recipes. Here's what it looks like if you search for European:

 
[Figure: Blue means search result]
lstTitles.BackColor = Color.CornflowerBlue

The user might want to restore the listbox so it shows all the recipe titles again. That's the purpose of the Show All Titles button.

I actually wrote this code twice. The original code is in the XML Challenge! above. But I decided to use the Call technique instead of duplicating all the code in the Form_Load event.

So let's take a look at what happens when the user clicks the Show All Titles button. Here's the code I came up with:

You need to do three things in this sub before calling the Form_Load event:

  • The listbox is colored blue after the user does a search. So here, when the user restores all the titles, you need to change it back to the normal white color.
  • The Form_Load event adds all the titles to the listbox. So you need to delete the titles now displayed. Otherwise, there would be duplicate titles. Use the Clear command to empty the listbox.
  • Finally, you don't want the Search textbox to display the user's last search word or phrase. So you empty that textbox by assigning an empty string "" to it.
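The listing isn't reproduced here, so this is a sketch of the three steps above followed by the Call. It's a fragment that belongs inside the Form1 class, and it assumes the controls are named lstTitles, txtSearch, and btnShowAllTitles, as the names used elsewhere in this lesson suggest.

```vb
Private Sub btnShowAllTitles_Click(sender As Object, e As EventArgs) Handles btnShowAllTitles.Click
    ' 1. Change the listbox from search-result blue back to normal white
    lstTitles.BackColor = Color.White

    ' 2. Empty the listbox so the titles aren't duplicated
    lstTitles.Items.Clear()

    ' 3. Remove the user's last search word or phrase
    txtSearch.Text = ""

    ' Let the Form_Load event refill the listbox with all the titles
    Call Form1_Load(Me, e)
End Sub
```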

Writing Your Own Sub

If you don't like the idea of calling one event from another, there's an alternative technique. You can put the code they have in common in a brand new sub that you write. This sub will hold the code that fills the listbox, so you can just Call the new sub from both the Form_Load and the btnShowAllTitles_Click subs.

We won't use this technique in the final cookbook program, but if you're interested in how to create your own sub, follow these steps:

  1. Find a location in the code window that's not inside some other sub. For example, go down to the bottom, just above the last code line (End Class always has to be the last code line). Click your mouse to put the blinking insertion cursor there.
  2. Copy this code and paste this new sub into the Code window:

You get to name your sub whatever you want, just like naming variables. I chose FillListboxWithTitles.

  3. Call this new sub from the Form_Load and btnShowAllTitles_Click events like this:

Notice the empty parentheses. When you call a sub, you can optionally send information to that sub inside the parentheses, but there's no need to do that here.
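The listings for these steps aren't reproduced here. The following fragment is a sketch of the whole arrangement, meant to go inside the Form1 class. The body of FillListboxWithTitles is my guess at the listbox-filling code: it assumes a recipes.xml file in the program's folder and a title element inside each recipe, so substitute the code from your own Form_Load event.

```vb
' The shared job: fill the listbox with every recipe title
Private Sub FillListboxWithTitles()
    ' Assumption: recipes.xml is in the program's folder and titles are <title> elements
    Dim doc As New System.Xml.XmlDocument()
    doc.Load("recipes.xml")

    For Each titleNode As System.Xml.XmlNode In doc.GetElementsByTagName("title")
        lstTitles.Items.Add(titleNode.InnerText)
    Next
End Sub

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    ' Nothing to send, so the parentheses stay empty
    Call FillListboxWithTitles()
End Sub

Private Sub btnShowAllTitles_Click(sender As Object, e As EventArgs) Handles btnShowAllTitles.Click
    lstTitles.BackColor = Color.White
    lstTitles.Items.Clear()
    txtSearch.Text = ""
    Call FillListboxWithTitles()
End Sub
```

With this arrangement, neither event calls the other. Both simply hand the job to the same sub.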

Recall that when you call events, you're required to send two pieces of information inside the parentheses:

(sender As Object, e As EventArgs)

So you always include that information when calling an event, like this:

        Call Form1_Load(Me, e)

But a sub you write is not an event like Button_Click or txtSearch_KeyPress. The user doesn't trigger this sub; the program itself executes it.

If you like this approach—creating your own sub to service two or more events that need the same job done—well, you're not alone. You have good programming instincts. Many experts consider this approach a best practice, as they call it—a fundamental principle of good software design. They frown on calling one event from another, describing it as amateurish and problematic. They sometimes even use the terms code smell or goofy.

But it's okay. If you are an amateur, you can still get away with a few shortcuts like this. Do remember, though, that events calling events can indeed cause problems when writing or maintaining more complex projects than our cookbook.

Let's Chat!

Because this is a beginner's course, I'm trying to keep the code concise and simple. That's why I Call the Form_Load event from the Show All Titles button:

        Call Form1_Load(Me, e)

How do you feel about this approach? Many experts say that you should avoid having one event call another. What's your take on this? Go to the Discussion Area, and let me know what you think.

Chapter 5

Summary

In this lesson you explored various ways to use XSL to transform XML data. You saw why you can't simply smash data together—there must be some kind of delimiter between each datum. And you looked at a very simple, yet still popular, delimiter system: CSV, comma separated values.

What else did you learn? Check the boxes below. If there's any item you feel you need to look at more carefully, feel free to go back now and reread!

How to create XSLT in VS.
The benefits of using the VS editor rather than a simple text editor like Notepad.
How to use an XSL transformation file to strip off the tags of an XML document and replace them with commas.
The three major differences between XSL formatting and XSL transforming.
How to modify XML structures by renaming, deleting, and adding elements to an existing XML document.
How to use the invaluable identity template to make a one-for-one copy of an XML document.
How to specify exceptions by filtering the output through a second template.
How to add a Show All Titles button that lets the user refresh the listbox of titles after performing a search.
How to avoid repeating code when two events do similar tasks.
Some cautions about software design practices.
A few things to consider if you plan to transition from amateur to professional programming.

In the next lesson, you'll learn how to validate an XML document to ensure that it's not only well-formed (that its tags are in the right places) but also that it adheres to rules defining its entire structure (elements, data types, relationships, attributes, default values, and so on). I'll see you there!

Supplementary Material

http://en.wikibooks.org/wiki/Visual_Basic/Coding_Standards

FAQs

Q: Are "best programming practices" essential?

A: In some cases, yes—if by essential you mean that you'll get into trouble by ignoring them. Some rules have evolved over time because experienced educators and programmers have shown them to be useful. Also, studies have confirmed the value of certain techniques.

Here are a few things that some types of bad coding practices can do:

  • Cause bugs
  • Make code hard to read
  • Create confusion
  • Make modifying the code more difficult
  • Make it difficult for other programmers' code to be merged with yours
  • Make it hard to reuse the code in other programs
  • Make you look foolish and inefficient to other programmers

I say some because not all rules are equally important, and some suggestions are just a matter of personal preference. But do be aware that many professional programmers must work with other programmers on team projects. You do the Import button code, and somebody else does Form_Load, and so on.

Some best practices are designed to keep people from stepping on one another's toes (some Object Oriented Programming techniques are essentially clerical—ways to avoid unintended interactions, version problems, code conflicts, and so on).

So when you work on a team, it's like a relay race: You don't want one of the runners to have personal preferences about which direction to go.


Q:
Couldn't you avoid having to use delimiters if you stored each datum in a fixed, predictable space? Like a set of identical boxes? For example, you could store strings in 60-character spaces, even if some of the strings are shorter. Then the computer would always know where one datum ended and another started, right?

A: Congratulations! Only the brightest students make this suggestion. Indeed, data is sometimes stored in fixed-length strings. You just pad any unused area with spaces:


Fixed-length data storage

The fixed-length approach works very well in situations where each datum is the same size, for example, when you're storing five-digit ID numbers or other data that are always a predictable length, like phone numbers. You can also use it with single-purpose data storage, like a simple table where you define a specific length for each column of data. But this technique, often called a flat-file database, doesn't contain structural relationships between the data as XML does.
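To make the idea concrete, here's a small sketch of padding and then reading back a fixed-length record. The widths (20 characters for the part name, 8 for the price) and the module name are hypothetical choices of mine, not anything from the cookbook or car parts programs.

```vb
Module FixedLengthDemo
    Sub Main()
        ' Hypothetical widths: 20 characters for the name, 8 for the price
        Dim record As String = "wiper blades".PadRight(20) & "20".PadRight(8)

        ' Each field is found by position alone, so no delimiter is needed
        Dim partName As String = record.Substring(0, 20).TrimEnd()
        Dim price As String = record.Substring(20, 8).TrimEnd()

        Console.WriteLine(partName & " costs " & price)   ' prints: wiper blades costs 20
    End Sub
End Module
```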

People sometimes criticize XML for using up lots of space with all the tags! But computer memory today is so much less expensive than it used to be. And when space is an issue—such as transmitting vast files—modern compression methods are effective (think Zip files).

 

Q: In the Car Parts.xml document, you named the root element carparts and each of its children carpart. Isn't using similar tag names like this going to be confusing?

A: I'm glad you noticed this. It's actually surprisingly common. You'll often come across books/book, employees/employee, and other similar pairs in XML, Object Oriented Programming, and other areas of computing. But in the cookbook program, I took a different approach. I used distinctly different tag names: cookbook/recipe. To me, descriptive distinctions like this are easier to read. Maybe you agree with me that using similar tags for parent/child elements is a bit confusing.

On the other hand, in the real world, you run across similar names all the time. For example, a filing cabinet might be labeled, "Employees," and contain a bunch of individual files named Employee. It does make a kind of sense, and it certainly creates a clear relationship between the tags—so maybe it won't confuse you. (I hope not, because this approach is so widespread!)

But it's great that these things are often a matter of personal choice. So go ahead and name variables, XML elements, and objects using the approach that works best for you.


Q:
You showed us how to transform an XML file into a CSV file. Can it go the other way? Transforming a CSV file into XML?

A: In the digital world, you can change any information into any other information. You've seen videos where a man's face morphs into a woman's.

But using XSL templates to change CSV into XML is surprisingly complicated! It's easier to just write a little VB program to do the job. You'll see how to do this in a future lesson when you create an Import button for the cookbook program that adds recipes to your recipes.xml file from the Windows clipboard. You can add any recipe you find on the Internet with one button click.

There's also another option if you prefer not to write your own program. Google "CSV to XML," and you'll find several sites that will do the conversion for you. Just paste your CSV into the Web page, and then it generates an XML version for you.

Assignment

Try writing an XSLT file in VS that creates a CSV version of the recipes.xml file. (We'll stick with the XML version of the recipes file for the cookbook program, but creating a CSV file on your own will help you master the technique!)

Follow these steps to create a new XSLT file and open the VS XSL editor:

  1. Open VS, and choose the XSL project in the start page.
  2. Create a new XSLT file by choosing Project > Add New Item.
  3. Scroll down the list of common items, and click the XSLT file to select it.
  4. In the Name field, type Recipes to CSV.
  5. Click the Add button.

You now see that VS has added its default code for an XSLT document, including the famous identity template that creates a one-for-one copy of the source XML file:

By default, VS specifies that the output be in the xml format, but we want a comma-separated-value file, and CSV files are plain text.

  1. So change the output method to text:

   <xsl:output method="text" indent="yes"/>

Now you want to add a second template that inserts comma delimiters (and leaves out the XML tags).

2. Add the highlighted second template:

3. Click the Form1.vb tab in the editor, and make the changes highlighted here to modify the file names:

4. Press F5, and then use Windows Explorer to locate the output file: C:\XML L7 Finished\XSL\recipes.csv.

5. Open it in Notepad by right-clicking it, and then choose Open With > Notepad in the context menu.

You'll see that the transformation removed the XML tags and separated the data by commas!